ABSTRACT

It is shown that the existence of a strict local minimum satisfying the
constraint qualification of [15] or McCormick's [11] second order sufficient
optimality condition implies the existence of a class of exact local penalty
functions (that is, ones with a finite value of the penalty parameter) for
a nonlinear programming problem. A lower bound to the penalty parameter is
given by a norm of the optimal Lagrange multipliers which is dual to the norm
used in the penalty function.
SIGNIFICANCE AND EXPLANATION

Exact  penalty  functions  are  associated  with  a constrained  optimization 
problem  in  such  a way  that  for  finite  values  of  a penalty  parameter  there  is 
a correspondence  between  local  (global)  optimal  solutions  of  the  unconstrained 
penalty  function  and  local  (global)  solutions  of  the  constrained  optimization 
problem.  Such  correspondence  is  important  because  it  can  be  exploited  to 
find  solutions  to  complicated  constrained  optimization  problems  by  solving 
a single  unconstrained  problem. 

In  this  work  it  is  shown  that  under  certain  reasonable  conditions  there 
is  a correspondence  between  solutions  of  the  constrained  optimization 
problem  and  the  unconstrained  optimal  solutions  of  a wide  class  of  exact 
penalty  functions.  Lower  bounds  are  also  given  for  the  penalty  parameter. 


The responsibility for the wording and views expressed in this descriptive
summary lies with MRC, and not with the authors of this report.
EXACT PENALTY FUNCTIONS IN NONLINEAR PROGRAMMING


S.-P.  Han  and  O.  L.  Mangasarian 


1.  INTRODUCTION 

We shall be concerned here with the nonlinear programming problem

$$\begin{aligned} \text{minimize}\quad & f(x) \\ \text{subject to}\quad & g(x) \le 0 \\ & h(x) = 0 \end{aligned} \tag{1.1}$$

where $f$, $g$ and $h$ are functions from $R^n$ into $R$, $R^m$ and $R^k$ respectively. A
point $x$ in $R^n$ satisfying the constraints $g(x) \le 0$, $h(x) = 0$ is called feasible. A
feasible point $\bar{x}$ such that $f(\bar{x}) \le f(x)$ for all feasible $x \ne \bar{x}$ in some neighborhood
$N(\bar{x})$ of $\bar{x}$ is called a local solution of (1.1). If $f(\bar{x}) < f(x)$ then $\bar{x}$ is called
a strict local solution of (1.1). We shall associate with this nonlinear programming
problem the following class of penalty functions

$$P(x,\alpha) := f(x) + \alpha\, Q(\|g(x)_+, h(x)\|) \tag{1.2}$$

where $\alpha$ is a nonnegative real number, $(g(x)_+)_j = \max\{0, g_j(x)\}$, $j = 1,\dots,m$, $\|\cdot\|$
is any fixed vector norm in $R^{m+k}$, and $Q$ is some function from the nonnegative real
line $R_+$ into itself with the following properties

$$Q(0) = 0, \quad Q(\zeta) > 0 \ \text{for}\ \zeta > 0, \quad \infty > Q'(0+) := \lim_{\zeta \to 0+} \frac{Q(\zeta) - Q(0)}{\zeta} > 0. \tag{1.3}$$

Obviously the third condition of (1.3) is equivalent to $Q'(0)$ being positive and
finite when $Q$ is differentiable at 0. Included in this class of penalty functions
is the classical exact penalty function

$$P_1(x,\alpha) := f(x) + \alpha\Big(\sum_{j=1}^{m} g_j(x)_+ + \sum_{j=1}^{k} |h_j(x)|\Big) \tag{1.4}$$

which is obtained from (1.2) by setting $Q(\zeta) = \zeta$ and using the one norm. With some
exceptions [1,2] most of the literature on exact penalty functions is devoted
to this particular penalty function [9,13,16,21,22] and is mainly concerned with conditions
that ensure that $P_1(x,\alpha)$ has a local (global) minimum at a local (global) minimum
of (1.1) for all sufficiently large but finite values of $\alpha$. The best known among these
conditions  is  probably  the  one  due  to  Pietrzykowski  [16]  which  requires  the  linear 
independence  of  the  gradients  of  all  the  equality  constraints  and  of  the  active  inequality 
constraints,  that  is  those  inequalities  satisfied  as  equalities  at  the  point  being 
considered. One of our principal results, Theorem 4.4, is more natural than
Pietrzykowski's result which it subsumes. It is more natural because it merely requires
the constraint qualification of [15]. This constraint qualification besides ensuring the
satisfaction  of  the  Karush-Kuhn-Tucker  conditions  at  local  minima  of  (1.1)  has  been 
shown  to  be  a necessary  and  sufficient  condition  for  the  constraints  of  (1.1)  to  be 
stable  under  small  perturbations  [19].  In  this  sense  this  constraint  qualification 
may  be  viewed  as  the  minimum  requirement  for  a problem  to  be  numerically  well-posed. 

Our generalization of the penalty $P_1(x,\alpha)$ to the class $P(x,\alpha)$ is not merely
generalization for its own sake but in order to allow us to handle other norms in (1.2) and
in particular the infinity and two norms which we will make use of elsewhere [7] to obtain
improved quasi-Newton computational algorithms [4,5,6,18]. We also note that the
classical exterior penalty function [3], which can also be obtained from (1.2) by using
the two norm and letting $Q(\zeta) = \zeta^2$, violates the requirement (1.3) because
$Q'(0) = 0$. This is as expected because it is well known that for the classical exterior
penalty function the penalty parameter $\alpha$ is not finite. (See however, an interesting
exception to this for linear programs in [1] and references therein.) Using instead
$Q(\zeta) = \zeta$ or $Q(\zeta) = \zeta + \zeta^2$ with the two norm would however result in an exact
penalty function which would again be nondifferentiable.
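
The contrast between an exact choice of $Q$ and the classical exterior penalty can be seen on a one-dimensional problem. The following is a minimal Python sketch, not taken from the report: the test problem (minimize $(x-2)^2$ subject to $x - 1 \le 0$), the helper names and the grid search are all illustrative assumptions. With $Q(\zeta) = \zeta$ the grid minimizer of $P(\cdot,3)$ is the constrained solution $x = 1$, while with $Q(\zeta) = \zeta^2$ it is the infeasible point $(2+\alpha)/(1+\alpha) = 1.25$.

```python
import math

def positive_part(values):
    """Componentwise max{0, v}, i.e. g(x)_+ in the notation of (1.2)."""
    return [max(0.0, v) for v in values]

def p_norm(values, p):
    """Vector p-norm for 1 <= p <= inf (p = math.inf gives the max norm)."""
    if not values:
        return 0.0
    if p == math.inf:
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def penalty(f, g, h, x, alpha, p=1, Q=lambda zeta: zeta):
    """P(x, alpha) = f(x) + alpha * Q(||g(x)_+, h(x)||_p), as in (1.2)."""
    residual = positive_part(g(x)) + list(h(x))
    return f(x) + alpha * Q(p_norm(residual, p))

# Hypothetical test problem: minimize (x - 2)^2 subject to x - 1 <= 0.
f = lambda x: (x - 2.0) ** 2
g = lambda x: [x - 1.0]
h = lambda x: []          # no equality constraints

def grid_argmin(alpha, Q):
    """Minimize P(., alpha) over the grid 0, 0.001, ..., 3 (two norm)."""
    grid = [i / 1000.0 for i in range(3001)]
    return min(grid, key=lambda x: penalty(f, g, h, x, alpha, p=2, Q=Q))

x_exact = grid_argmin(3.0, lambda zeta: zeta)          # Q'(0+) = 1, satisfies (1.3)
x_exterior = grid_argmin(3.0, lambda zeta: zeta ** 2)  # Q'(0+) = 0, violates (1.3)
print(x_exact, x_exterior)
```

Increasing `alpha` in the exterior case moves the minimizer toward 1 but never reaches it, which is the sense in which the penalty parameter "is not finite".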

Because of the significant role played in this paper by the constraint qualification
of [15], Section 2 of this paper will be devoted to the derivation of an equivalent
statement  of  this  constraint  qualification  which  will  be  used  in  deriving  one  of  our 
principal  results,  Theorem  4.4.  Section  3 is  devoted  to  second  order  sufficient 
optimality conditions which also play an important role in establishing the existence
of exact penalty minimum points. In particular we derive a second order sufficient
optimality condition of the Fritz John type (Theorem 3.1) which subsumes McCormick's
well known second order sufficient optimality condition [3,11]. We also give an
equivalent formulation (3.6) of McCormick's second order condition (3.9) which may
be used to derive second order optimality conditions for quadratic programming without
any knowledge of the optimal Lagrange multipliers (Corollary 3.6). Section 4 contains
our principal results pertaining to the class of exact penalty functions $P(x,\alpha)$ defined
by (1.2). Theorem 4.1 shows that the existence of an exact penalty function minimum
point implies the existence of a minimum point to the nonlinear programming problem
(1.1). Theorem 4.2 establishes the equivalence of local minima of the class of exact
penalty functions defined by (1.2). Theorem 4.4 shows that for sufficiently large
but finite $\alpha$, $P(x,\alpha)$ has a local minimum point at any strict local minimum point $\bar{x}$
of (1.1) which satisfies the constraint qualification of [15]. In Theorem 4.6 we
show that McCormick's second order sufficiency conditions imply that $P(x,\alpha)$ has a
strict local minimum for all values of the penalty parameter $\alpha$ that are larger than
a constant times a norm of the optimal Lagrange multipliers. This norm is dual to the
norm used in the definition of the exact penalty function (1.2). In Theorem 4.7 we
show that the existence of a local minimum of $P(x,\alpha)$ for all sufficiently large $\alpha$
implies, under suitable assumptions, the satisfaction of the Karush-Kuhn-Tucker
conditions [10] for problem (1.1). In our final theorem, Theorem 4.8, we treat the convex
case and again establish the fact that the generalized Slater constraint qualification
[12] implies that $P(x,\alpha)$ has a global minimum for all values of the penalty parameter
larger than or equal to the lower bound established in Theorem 4.6.

To simplify notation a vector is either a row or a column vector depending on
the context. For example, the inner product of two vectors $x$ and $y$ is written
simply as $xy$ rather than $x^T y$.

2. Equivalent Forms of the Constraint Qualification


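We begin by recalling the following definition of the constraint qualification.

2.1. Definition [12,15]. Let $g(\bar{x}) \le 0$, $h(\bar{x}) = 0$ and $I = \{i \mid g_i(\bar{x}) = 0,\ i = 1,\dots,m\}$.
The constraints $g(x) \le 0$, $h(x) = 0$ are said to satisfy the constraint qualification
of [15] at $\bar{x}$ if $g$ is differentiable at $\bar{x}$, $h$ is continuously differentiable at $\bar{x}$,
$\nabla h_i(\bar{x})$, $i = 1,\dots,k$, are linearly independent
and there exists a $z \in R^n$ such that

$$\nabla g_i(\bar{x}) z < 0,\ i \in I, \qquad \nabla h_i(\bar{x}) z = 0,\ i = 1,\dots,k. \tag{2.1}$$
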


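It can be shown by using theorems of the alternative [12] that (2.1) is equivalent
to the following condition

There exist no $u_i$, $i \in I$, and $v_i$, $i = 1,\dots,k$, such that

$$\sum_{i \in I} u_i \nabla g_i(\bar{x}) + \sum_{i=1}^{k} v_i \nabla h_i(\bar{x}) = 0, \quad u_i \ge 0,\ i \in I, \quad (u_i,\ i \in I;\ v) \ne 0. \tag{2.2}$$

We now derive another equivalent form of this constraint qualification which will be used in
deriving our exact penalty results.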


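2.2. Theorem (Constraint Qualification Equivalence). Let $g(\bar{x}) \le 0$, $h(\bar{x}) = 0$,
$I = \{i \mid g_i(\bar{x}) = 0,\ i = 1,\dots,m\}$ and let $g$ and $h$ be continuously differentiable at $\bar{x}$.
The constraint qualification (2.1) is satisfied at $\bar{x}$ if and only if there exists
an open neighborhood $N(\bar{x};\varepsilon)$ of $\bar{x}$ such that

For each bounded function $b(x) : N(\bar{x};\varepsilon) \to R^k$
there exists a bounded function $d(x) : N(\bar{x};\varepsilon) \to R^n$
such that for all $x$ in $N(\bar{x};\varepsilon)$

$$\nabla g_i(x) d(x) \le -1,\ i \in I, \qquad \nabla h_i(x) d(x) = b_i(x),\ i = 1,\dots,k. \tag{2.3}$$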


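Proof. (2.3) $\Longrightarrow$ (2.1): Just set $b(x) = 0$ and $x = \bar{x}$ in (2.3) and note that for
each $b$ in $R^k$, $\nabla h_i(\bar{x}) z = b_i$, $i = 1,\dots,k$, has a solution $z$ in $R^n$.

(2.1) $\Longrightarrow$ (2.3): Because $\nabla h_i(\bar{x})$, $i = 1,\dots,k$, are linearly independent it follows
that $k \le n$. Choose $n - k$ vectors in $R^n$, $w^1, w^2, \dots, w^{n-k}$, such that
$\{\nabla h_1(\bar{x}),\dots,\nabla h_k(\bar{x}), w^1,\dots,w^{n-k}\}$ are linearly independent. Define the $n \times n$ matrix
function $A(x)$ as follows

$$A(x) := \begin{bmatrix} \nabla h_1(x) \\ \vdots \\ \nabla h_k(x) \\ w^1 \\ \vdots \\ w^{n-k} \end{bmatrix}.$$
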


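Since $A(\bar{x})$ is nonsingular there exists an $\varepsilon > 0$ such that $A^{-1}(x)$ exists and is bounded
in $N(\bar{x};\varepsilon)$. By (2.1) there exists a vector $z$ in $R^n$ such that

$$\nabla g_i(\bar{x}) z < 0,\ i \in I, \qquad \nabla h_i(\bar{x}) z = 0,\ i = 1,\dots,k.$$

Define $z(x) := A^{-1}(x) c$ where

$$c := \begin{bmatrix} 0 \\ \vdots \\ 0 \\ w^1 z \\ \vdots \\ w^{n-k} z \end{bmatrix} \in R^n.$$

Clearly $z(\bar{x}) = z$ and $z(x)$ is continuous in $N(\bar{x};\varepsilon)$. Thus we can shrink $\varepsilon$, if
necessary, so that

$$\nabla g_i(x) z(x) \le -\frac{\gamma}{2} \quad \text{for } x \in N(\bar{x};\varepsilon) \text{ and } i \in I$$

where $-\gamma := \max_{i \in I} \{\nabla g_i(\bar{x}) z\} < 0$. We also have that

$$\nabla h_i(x) z(x) = 0,\ i = 1,\dots,k, \quad \text{for } x \in N(\bar{x};\varepsilon).$$
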

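Let $b(x)$ be any given bounded function from $N(\bar{x};\varepsilon)$ into $R^k$, let

$$\tilde{b}(x) := \begin{bmatrix} b(x) \\ 0 \end{bmatrix} \in R^n$$

and let $y(x) := A^{-1}(x)\tilde{b}(x)$. The function $y(x)$ is bounded in $N(\bar{x};\varepsilon)$
and furthermore

$$\nabla h_i(x) y(x) = b_i(x),\ i = 1,\dots,k.$$

Let

$$d(x) := \delta z(x) + y(x)$$

where

$$\delta := \frac{2(1 + |\lambda|)}{\gamma} \quad \text{and} \quad \lambda := \max_{i \in I}\ \sup_{x \in N(\bar{x};\varepsilon)} \{\nabla g_i(x) y(x)\}.$$

Then $\nabla g_i(x) d(x) \le -\delta\gamma/2 + \lambda \le -1$ for $i \in I$, and $\nabla h_i(x) d(x) = b_i(x)$ for $i = 1,\dots,k$.
Hence $d(x)$ is bounded and satisfies (2.3). □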

We note that the more stringent constraint qualification used by Pietrzykowski [16],
namely that the gradients $\nabla g_i(\bar{x})$, $i \in I$, $\nabla h_1(\bar{x}),\dots,\nabla h_k(\bar{x})$, are linearly independent,
implies the constraint qualification (2.2) and hence its equivalents (2.1) and (2.3).


3. Second Order Sufficient Optimality Conditions


We first derive in this section a second order sufficient optimality condition
of the Fritz John type for problem (1.1) which subsumes the standard second order
sufficiency condition of McCormick [11].

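3.1. Theorem (Generalized Second Order Sufficiency). Let $\bar{x}$ be a local solution
of (1.1) or let $(\bar{x},\bar{u}_0,\bar{u},\bar{v}) \in R^{n+1+m+k}$ satisfy the Fritz John necessary optimality
conditions for problem (1.1)

$$\begin{aligned} &\bar{u}_0 \nabla f(\bar{x}) + \sum_{i=1}^{m} \bar{u}_i \nabla g_i(\bar{x}) + \sum_{j=1}^{k} \bar{v}_j \nabla h_j(\bar{x}) = 0 \\ &(\bar{u}_0,\bar{u}) \ge 0, \quad (\bar{u}_0,\bar{u},\bar{v}) \ne 0 \\ &\bar{u} g(\bar{x}) = 0, \quad g(\bar{x}) \le 0, \quad h(\bar{x}) = 0. \end{aligned} \tag{3.1}$$

Let $f$, $g$ and $h$ be twice differentiable at $\bar{x}$, let $I = \{i \mid g_i(\bar{x}) = 0,\ i = 1,\dots,m\}$
and let

$$\left. \begin{aligned} &\nabla f(\bar{x}) x \le 0 \\ &\nabla g_i(\bar{x}) x \le 0,\ i \in I \\ &\nabla h_i(\bar{x}) x = 0,\ i = 1,\dots,k \\ &x \ne 0 \end{aligned} \right\} \Longrightarrow x \nabla_{11} L^0(\bar{x},\bar{u}_0,\bar{u},\bar{v}) x > 0 \tag{3.2}$$

where

$$L^0(x,u_0,u,v) := u_0 f(x) + u g(x) + v h(x) \tag{3.3}$$

and $\nabla_{11} L^0(x,u_0,u,v)$ denotes the $n \times n$ Hessian of $L^0(x,u_0,u,v)$ with respect to
its first argument $x$. Then $\bar{x}$ is a strict local minimum of (1.1).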

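Proof. We shall assume that $\bar{x}$ is not a strict local minimum of (1.1) and exhibit a
contradiction. Since $\bar{x}$ is assumed not to be a strict local minimum of (1.1), there
exists a sequence of feasible points $\{x^j\}$, that is $g(x^j) \le 0$ and $h(x^j) = 0$,
converging to $\bar{x}$, such that $f(x^j) \le f(\bar{x})$ and $x^j \ne \bar{x}$.
Hence there exists an accumulation point $\bar{s}$ of the sequence

$$\{s^j\} := \left\{ \frac{x^j - \bar{x}}{\|x^j - \bar{x}\|} \right\}$$

such that

$$\|\bar{s}\| = 1, \quad \nabla f(\bar{x})\bar{s} \le 0, \quad \nabla g_i(\bar{x})\bar{s} \le 0,\ i \in I, \quad \nabla h_i(\bar{x})\bar{s} = 0,\ i = 1,\dots,k. \tag{3.4}$$

Making use of the twice differentiability property now gives

$$\begin{aligned} 0 &\ge \frac{f(x^j) - f(\bar{x})}{\|x^j - \bar{x}\|^2} = \frac{\nabla f(\bar{x}) s^j}{\|x^j - \bar{x}\|} + \frac{1}{2} s^j \nabla^2 f(\bar{x}) s^j + \frac{o(\|x^j - \bar{x}\|^2)}{\|x^j - \bar{x}\|^2} \\ 0 &\ge \frac{g_i(x^j) - g_i(\bar{x})}{\|x^j - \bar{x}\|^2} = \frac{\nabla g_i(\bar{x}) s^j}{\|x^j - \bar{x}\|} + \frac{1}{2} s^j \nabla^2 g_i(\bar{x}) s^j + \frac{o(\|x^j - \bar{x}\|^2)}{\|x^j - \bar{x}\|^2}, \quad i \in I \\ 0 &= \frac{h_i(x^j) - h_i(\bar{x})}{\|x^j - \bar{x}\|^2} = \frac{\nabla h_i(\bar{x}) s^j}{\|x^j - \bar{x}\|} + \frac{1}{2} s^j \nabla^2 h_i(\bar{x}) s^j + \frac{o(\|x^j - \bar{x}\|^2)}{\|x^j - \bar{x}\|^2}, \quad i = 1,\dots,k. \end{aligned}$$

Multiplication of the above relations respectively by $\bar{u}_0$, $\bar{u}_i$, $i \in I$, $\bar{v}_i$, $i = 1,\dots,k$,
summing and making use of the first equality of the Fritz John conditions (3.1) which
must hold when $\bar{x}$ is a local solution of (1.1) [12,15] gives

$$0 \ge \frac{1}{2} s^j \nabla_{11} L^0(\bar{x},\bar{u}_0,\bar{u},\bar{v}) s^j + \frac{o(\|x^j - \bar{x}\|^2)}{\|x^j - \bar{x}\|^2}.$$

Hence the accumulation point $\bar{s}$ of $\{s^j\}$ satisfies

$$\bar{s} \nabla_{11} L^0(\bar{x},\bar{u}_0,\bar{u},\bar{v}) \bar{s} \le 0.$$

This inequality together with (3.4) contradicts (3.2). □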

We state now a paraphrase of McCormick's second order sufficient optimality
conditions which may have certain advantages over the standard way [3,11] these
conditions are stated. We will show that the paraphrase and standard statements are
equivalent, and we discuss below some of the advantages of the paraphrase.

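3.2. Theorem (Paraphrase of McCormick's Second Order Sufficiency). Let
$(\bar{x},\bar{u},\bar{v}) \in R^{n+m+k}$ satisfy the Karush-Kuhn-Tucker necessary optimality conditions for
problem (1.1)

$$\begin{aligned} &\nabla f(\bar{x}) + \sum_{j=1}^{m} \bar{u}_j \nabla g_j(\bar{x}) + \sum_{j=1}^{k} \bar{v}_j \nabla h_j(\bar{x}) = 0 \\ &\bar{u} \ge 0, \quad \bar{u} g(\bar{x}) = 0, \quad g(\bar{x}) \le 0, \quad h(\bar{x}) = 0. \end{aligned} \tag{3.5}$$

Let $f$, $g$ and $h$ be twice differentiable at $\bar{x}$, let $I = \{i \mid g_i(\bar{x}) = 0,\ i = 1,\dots,m\}$
and let

$$\left. \begin{aligned} &\nabla f(\bar{x}) x \le 0 \\ &\nabla g_i(\bar{x}) x \le 0,\ i \in I \\ &\nabla h_i(\bar{x}) x = 0,\ i = 1,\dots,k \\ &x \ne 0 \end{aligned} \right\} \Longrightarrow x \nabla_{11} L(\bar{x},\bar{u},\bar{v}) x > 0 \tag{3.6}$$

where

$$L(x,u,v) := f(x) + u g(x) + v h(x). \tag{3.7}$$

Then $\bar{x}$ is a strict local minimum of (1.1).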

3.3. Remark. Theorem 3.1 subsumes Theorem 3.2 because whenever the Karush-Kuhn-Tucker
conditions (3.5) are satisfied, so are the Fritz John conditions (3.1) with $\bar{u}_0 = 1$.
The following simple example shows that there are indeed cases which are covered by
Theorem 3.1 and not by Theorem 3.2:

$$\begin{aligned} \text{minimize}\quad & x_1 \\ \text{subject to}\quad & -x_2 \le 0 \\ & x_1^2 + x_2 \le 0. \end{aligned} \tag{3.8}$$

The origin in $R^2$ is the only feasible point and hence is a strict local solution.
Theorem 3.1 can be used to verify the uniqueness of the solution because the Fritz John
conditions are satisfied, whereas because the Karush-Kuhn-Tucker conditions are not
satisfied, Theorem 3.2 cannot be employed. The same example (3.8) can be used to show
that the origin is not a local minimum of $P_1(x,\alpha)$ as defined in (1.4) for this
problem. Hence the second order Fritz John conditions cannot guarantee the existence
of a local minimum for $P_1(x,\alpha)$. We will show however in Theorem 4.6 that McCormick's
second order sufficient optimality conditions are sufficient to ensure that all exact
penalty functions as defined by (1.2) have a strict local minimum.
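
The failure of $P_1$ at the origin for example (3.8) can be checked directly. The sketch below is an illustration and not part of the report: it assumes the objective of (3.8) is $x_1$ with constraints $-x_2 \le 0$ and $x_1^2 + x_2 \le 0$, and the helper names are hypothetical. Along the points $(-t, 0)$ one has $P_1 = -t + \alpha t^2 < 0 = P_1(0,0,\alpha)$ whenever $0 < t < 1/\alpha$, so points arbitrarily close to the origin have a lower penalty value for every finite $\alpha$.

```python
def p1(x1, x2, alpha):
    """Classical exact penalty (1.4) for the assumed form of example (3.8)."""
    return x1 + alpha * (max(0.0, -x2) + max(0.0, x1 * x1 + x2))

def descent_points(alpha, count=5):
    """Points (-t, 0) with 0 < t <= 1/(2*alpha), approaching the origin."""
    return [(-(0.5 / alpha) * 10.0 ** (-k), 0.0) for k in range(count)]

for alpha in (1.0, 10.0, 1.0e3, 1.0e6):
    for (x1, x2) in descent_points(alpha):
        # Each point is strictly better than the origin for this alpha.
        assert p1(x1, x2, alpha) < p1(0.0, 0.0, alpha)
```

However large `alpha` is chosen, the loop finds improving points, in line with the remark that the Fritz John second order condition alone does not give exactness.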

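3.4. Remark. The standard way of stating the second order sufficiency condition is
to replace the implication (3.6) by the following equivalent one

$$\left. \begin{aligned} &\nabla g_i(\bar{x}) x = 0,\ i \in J \\ &\nabla g_i(\bar{x}) x \le 0,\ i \in K \\ &\nabla h_i(\bar{x}) x = 0,\ i = 1,\dots,k \\ &x \ne 0 \end{aligned} \right\} \Longrightarrow x \nabla_{11} L(\bar{x},\bar{u},\bar{v}) x > 0 \tag{3.9}$$

where $J$ and $K$ are the following subsets of $I$

$$J = \{i \mid g_i(\bar{x}) = 0,\ \bar{u}_i > 0,\ i = 1,\dots,m\}$$

$$K = \{i \mid g_i(\bar{x}) = 0,\ \bar{u}_i = 0,\ i = 1,\dots,m\}.$$

That implication (3.9) is equivalent to implication (3.6) can be easily established
as shown by the following theorem.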

3.5. Theorem (Equivalence of (3.6) and (3.9)). Under the assumptions of Theorem 3.2
implications (3.6) and (3.9) are equivalent.

Proof. We will show that, under the assumptions of Theorem 3.2, the sets $S$ and $T$
in $R^n$ satisfying the conditions on the left-hand side of implications (3.6) and
(3.9) respectively are equal.

We first show that $S \subset T$. We assume that $S$ is nonempty, otherwise the inclusion
is trivially true. Let $x$ be in $S$. Clearly, we only need to show that for
$j \in J$, $\nabla g_j(\bar{x}) x = 0$. By (3.5) we have that

$$\nabla f(\bar{x}) x + \sum_{j=1}^{m} \bar{u}_j \nabla g_j(\bar{x}) x + \sum_{j=1}^{k} \bar{v}_j \nabla h_j(\bar{x}) x = 0.$$

Because $\nabla h_j(\bar{x}) x = 0$ for $j = 1,\dots,k$ and $\bar{u}_j = 0$ for $j \in K$ (and, by
complementarity, for $j \notin I$), we have

$$\nabla f(\bar{x}) x + \sum_{j \in J} \bar{u}_j \nabla g_j(\bar{x}) x = 0.$$

Because each term in the above equation is nonpositive and $\bar{u}_j > 0$ for $j \in J$, we
then have

$$\nabla g_j(\bar{x}) x = 0 \quad \text{for } j \in J.$$

We now prove that $T \subset S$. Again we assume that $T$ is nonempty and let $x$ be
any point in $T$. It suffices to show that $\nabla f(\bar{x}) x \le 0$. As before, we have

$$\nabla f(\bar{x}) x + \sum_{j \in I} \bar{u}_j \nabla g_j(\bar{x}) x + \sum_{j=1}^{k} \bar{v}_j \nabla h_j(\bar{x}) x = 0.$$

Clearly $\nabla f(\bar{x}) x = 0$ because all the other terms are zero. The proof is then complete. □
We give now an interpretation of the implication (3.6). The set of $x$ in $R^n$
satisfying the left-hand side conditions of (3.6) can be seen [14] to be the set of
directions along which the linearized problem, obtained by linearizing (1.1) around
$\bar{x}$, has nonunique solutions. In order to have uniqueness for the nonlinear problem,
implication (3.6) requires that the Hessian of the Lagrangian be positive definite
along these directions. Besides having this simple interpretation, implication (3.6)
is also simpler than (3.9) because the left-hand side conditions of (3.6) do not
require any information on the multiplier vector $\bar{u}$ whereas the corresponding
conditions of (3.9) do. As an example of the usefulness of this fact we give below a
sufficient condition for the existence of a strict local minimum point for a quadratic
programming problem which does not require the knowledge of any of the multipliers.

3.6. Corollary (Sufficient Conditions for a Strict Local Minimum in Quadratic
Programming). Let $\bar{x}$ be a local solution of the quadratic program

$$\begin{aligned} \text{minimize}\quad & \tfrac{1}{2} xQx + px \\ \text{subject to}\quad & Ax \le b \\ & Cx = d \end{aligned} \tag{3.11}$$

where $Q$, $A$ and $C$ are $n \times n$, $m \times n$ and $k \times n$ matrices respectively with $Q$
symmetric, and $p$, $b$ and $d$ are vectors in $R^n$, $R^m$ and $R^k$ respectively. Let
$I = \{i \mid A_i \bar{x} = b_i,\ i = 1,\dots,m\}$. If
4. Exact Penalty Functions


We derive in this section our principal results which relate local (global)
solutions of the penalty function (1.2) to local (global) solutions of the nonlinear
programming problem (1.1). Our vehicle for deriving many of the results of this
section will be the classical exact penalty function $P_1(x,\alpha)$ defined
in (1.4). But because we wish to establish these results for the more general penalty
function of (1.2) we establish an important equivalence between members of the class
of penalty functions given by (1.2) in Theorem 4.2 below. Before doing this we
establish the sufficiency of the existence of an exact penalty minimum point for the
existence of a minimum point to the nonlinear programming problem. This theorem was
given in [13] without proof.

4.1. Theorem (Sufficiency of Exact Penalty Minimum). If there exists an $\bar{\alpha} \ge 0$ such
that for all $\alpha \ge \bar{\alpha}$, $P(\bar{x},\alpha) \le P(x,\alpha)$ for all $x$ in some set $Y$ containing $\bar{x}$ and some
feasible point of (1.1), then $\bar{x}$ solves (1.1) subject to the extra condition that $x \in Y$.

Proof. We first show by contradiction that $\bar{x}$ must be feasible for problem (1.1).
If $\bar{x}$ is infeasible then $Q(\|g(\bar{x})_+, h(\bar{x})\|) > 0$. Choose any feasible point $\hat{x}$ which
is also in $Y$ and let

$$\alpha > \max\left\{ \bar{\alpha},\ \frac{f(\hat{x}) - f(\bar{x})}{Q(\|g(\bar{x})_+, h(\bar{x})\|)} \right\}.$$

We then have

$$f(\hat{x}) = P(\hat{x},\alpha) \ge P(\bar{x},\alpha) = f(\bar{x}) + \alpha Q(\|g(\bar{x})_+, h(\bar{x})\|) > f(\hat{x})$$

where the last inequality follows from the choice of $\alpha$. This gives a contradiction
and hence $\bar{x}$ is feasible for (1.1). To show that $\bar{x}$ is optimal for (1.1) let $x$
be any other feasible point for (1.1) which is also in $Y$ and let $\alpha \ge \bar{\alpha}$. Then

$$f(\bar{x}) = P(\bar{x},\alpha) \le P(x,\alpha) = f(x)$$

and hence $\bar{x}$ solves (1.1) with the added restriction that $x \in Y$. □

We show now that local minima of exact penalty functions of the class given by
(1.2) are equivalent.
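4.2. Theorem (Equivalence of Local Minima of Exact Penalty Functions). Let
$\|\cdot\|_\mu$ and $\|\cdot\|_\nu$ denote two vector norms in $R^{m+k}$ and let the corresponding exact penalty
functions defined by (1.2) be denoted by $P_\mu(x,\alpha)$ and $P_\nu(x,\alpha)$ with corresponding
$Q_\mu$ and $Q_\nu$ satisfying (1.3). If there exists an $\bar{x}$ in $R^n$, an $\bar{\alpha}_\mu \ge 0$, and a
neighborhood $N_\mu(\bar{x})$ of $\bar{x}$ containing some feasible point of (1.1) such that $g$ and $h$
are continuous on $N_\mu(\bar{x})$ and

$$P_\mu(\bar{x},\alpha) \le P_\mu(x,\alpha) \quad \text{for all } x \in N_\mu(\bar{x}) \text{ and } \alpha \ge \bar{\alpha}_\mu$$

then there exists an $\bar{\alpha}_\nu \ge 0$ and a neighborhood $N_\nu(\bar{x})$ containing some feasible point
of (1.1) such that

$$P_\nu(\bar{x},\alpha) \le P_\nu(x,\alpha) \quad \text{for all } x \in N_\nu(\bar{x}) \text{ and } \alpha \ge \bar{\alpha}_\nu$$

where

$$\bar{\alpha}_\nu = \bar{\alpha}_\mu\, \frac{1 + \varepsilon}{1 - \varepsilon}\, \frac{Q_\mu'(0+)}{Q_\nu'(0+)\, \gamma_{\mu\nu}} \quad \text{for any } \varepsilon \in (0,1)$$

and $\gamma_{\mu\nu}$ is the positive number relating the $\mu$-norm and the $\nu$-norm in $R^{m+k}$ by

$$\gamma_{\mu\nu} \le \frac{\|y\|_\nu}{\|y\|_\mu} \quad \text{for all nonzero } y \in R^{m+k}.$$
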


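Proof. By Theorem 4.1, $\bar{x}$ is feasible for problem (1.1). Choose $\varepsilon \in (0,1)$ and
$\bar{t} > 0$ such that

$$(1 + \varepsilon) Q_\mu'(0+) t \ge Q_\mu(t) \quad \text{and} \quad Q_\nu(t) \ge (1 - \varepsilon) Q_\nu'(0+) t \quad \text{for } t \in (0,\bar{t}]$$

and choose $N_\nu(\bar{x}) \subset N_\mu(\bar{x})$ sufficiently small such that $\|g(x)_+, h(x)\|_\nu \le \bar{t}$ and
$\|g(x)_+, h(x)\|_\mu \le \bar{t}$ for $x \in N_\nu(\bar{x})$. This is possible because $g(\bar{x})_+ = 0$, $h(\bar{x}) = 0$
and $g$ and $h$ are continuous on $N_\mu(\bar{x})$. Note that $N_\nu(\bar{x})$ contains a feasible point
to problem (1.1), namely the point $\bar{x}$ itself. For any $\alpha \ge \bar{\alpha}_\nu$ and $x \in N_\nu(\bar{x})$ we have

$$\begin{aligned} P_\nu(x,\alpha) &= f(x) + \alpha Q_\nu(\|g(x)_+, h(x)\|_\nu) \\ &\ge f(x) + \alpha (1 - \varepsilon) Q_\nu'(0+) \|g(x)_+, h(x)\|_\nu \\ &\ge f(x) + \alpha \gamma_{\mu\nu} (1 - \varepsilon) Q_\nu'(0+) \|g(x)_+, h(x)\|_\mu \\ &= f(x) + \alpha \gamma_{\mu\nu} \frac{(1 - \varepsilon) Q_\nu'(0+)}{(1 + \varepsilon) Q_\mu'(0+)} (1 + \varepsilon) Q_\mu'(0+) \|g(x)_+, h(x)\|_\mu \\ &\ge f(x) + \alpha \gamma_{\mu\nu} \frac{(1 - \varepsilon) Q_\nu'(0+)}{(1 + \varepsilon) Q_\mu'(0+)} Q_\mu(\|g(x)_+, h(x)\|_\mu) \\ &\ge f(\bar{x}) + \alpha \gamma_{\mu\nu} \frac{(1 - \varepsilon) Q_\nu'(0+)}{(1 + \varepsilon) Q_\mu'(0+)} Q_\mu(\|g(\bar{x})_+, h(\bar{x})\|_\mu) \\ &= f(\bar{x}) \\ &= f(\bar{x}) + \alpha Q_\nu(\|g(\bar{x})_+, h(\bar{x})\|_\nu) \\ &= P_\nu(\bar{x},\alpha). \end{aligned}$$

The last inequality holds because $\alpha \ge \bar{\alpha}_\nu$ makes the penalty parameter multiplying $Q_\mu$ at least $\bar{\alpha}_\mu$. □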

To  prove  our  next  principal  result,  namely  that  when  a strict  local  minimum  of 
(1.1)  exists  satisfying  the  constraint  qualification  of  [15],  a local  minimum  to  the 
exact  penalty  function  P(x,a)  exists,  we  make  use  of  the  following  result  due  to 
Pietrzykowski.

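4.3. Lemma (Pietrzykowski [17]). Let $f$, $g$ and $h$ be continuous on a neighborhood
of $\bar{x}$ and let $\bar{x}$ be a strict local minimum point of problem (1.1). There exists a
number $\bar{\alpha} \ge 0$ such that for any $\alpha \ge \bar{\alpha}$ there exists a positive number $\varepsilon(\alpha)$ and
a vector $x(\alpha)$ in $R^n$ such that

(i) $x(\alpha) \in N(\bar{x};\varepsilon(\alpha))$

(ii) $\lim_{\alpha \to \infty} \varepsilon(\alpha) = 0$

(iii) $P(x(\alpha),\alpha) \le P(x,\alpha)$ for all $x \in N(\bar{x};\varepsilon(\alpha))$.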

4.4. Theorem (Strict Local Minimum and Constraint Qualification Imply Local Minimum
of Exact Penalty). Let $f$, $g$ and $h$ be continuously differentiable on a neighborhood
of a strict local minimum point $\bar{x}$ of (1.1) and let the constraint qualification
(2.1) hold at $\bar{x}$. Then for each norm $\|\cdot\|$ in $R^{m+k}$ there exists an $\bar{\alpha} \ge 0$,
such that for all $\alpha \ge \bar{\alpha}$, $\bar{x}$ is a local minimum of $P(x,\alpha)$, where $P(x,\alpha)$ is defined
in (1.2) with $Q$ satisfying (1.3).

Proof. We will establish the result for $P_1(x,\alpha)$ of (1.4) and the theorem will follow,
by virtue of Theorem 4.2, for all other $P(x,\alpha)$ defined by (1.2) with $Q$ satisfying
(1.3).

Let $\bar{x}$ be a strict local minimum of (1.1) in the neighborhood $N(\bar{x};\bar{\varepsilon})$. If
$I = \{i \mid g_i(\bar{x}) = 0,\ i = 1,\dots,m\}$ is empty and there are no equality constraints $h(x) = 0$,
the theorem is trivially true. So assume that $I$ is nonempty or there exists at
least one equality constraint. By Lemma 4.3, for all sufficiently large $\alpha$, there
exist $\varepsilon(\alpha) > 0$ and $x(\alpha)$ such that $x(\alpha)$ is a local minimum point of $P_1(x,\alpha)$
in $N(\bar{x};\varepsilon(\alpha))$ and $\lim_{\alpha \to \infty} \varepsilon(\alpha) = 0$. Let $\alpha$ be sufficiently large such that $\varepsilon(\alpha) < \bar{\varepsilon}$.
If for such an $\alpha$ the point $x(\alpha)$ is feasible for problem (1.1), then by Lemma 4.3

$$f(\bar{x}) = P_1(\bar{x},\alpha) \ge P_1(x(\alpha),\alpha) = f(x(\alpha)).$$

Because $\bar{x}$ is a strict local minimum of (1.1), we then have that $\bar{x} = x(\alpha)$ and hence
$\bar{x}$ is a local minimum of $P_1(x,\alpha)$. Therefore, to complete the proof we
only need to show that $x(\alpha)$ is feasible for all sufficiently large $\alpha$. We shall
assume the contrary, that is there exists a sequence of positive numbers $\{\alpha_j\} \to \infty$
such that $x(\alpha_j)$ is infeasible for problem (1.1), and exhibit a contradiction. Let
a neighborhood $N(\bar{x};\varepsilon)$ be defined as in Theorem 2.2 and consider the bounded function
$b(x) : N(\bar{x};\varepsilon) \to R^k$ defined by

$$b_i(x) = \begin{cases} -h_i(x)/|h_i(x)| & \text{if } h_i(x) \ne 0 \\ 0 & \text{if } h_i(x) = 0. \end{cases}$$

By Theorem 2.2 there exists a bounded function $d(x) : N(\bar{x};\varepsilon) \to R^n$ such that for
all $x \in N(\bar{x};\varepsilon)$

$$\nabla g_i(x) d(x) \le -1,\ i \in I, \qquad \nabla h_i(x) d(x) = \begin{cases} -1 & \text{if } h_i(x) > 0 \\ 0 & \text{if } h_i(x) = 0 \\ 1 & \text{if } h_i(x) < 0. \end{cases}$$

Now choose $\varepsilon_1 \in (0,\varepsilon)$ such that $g_i(x) < 0$ for $x \in N(\bar{x};\varepsilon_1)$ and $i \notin I$. We then
have for $x \in N(\bar{x};\varepsilon_1)$ and $x$ infeasible for (1.1) the following directional
derivative for $P_1(x,\alpha)$ of (1.4) in the direction $d(x)$

$$\begin{aligned} P_1'(x,\alpha;d(x)) ={}& \nabla f(x) d(x) + \alpha \sum_{g_i(x) > 0} \nabla g_i(x) d(x) + \alpha \sum_{g_i(x) = 0} (\nabla g_i(x) d(x))_+ \\ &+ \alpha \sum_{h_i(x) > 0} \nabla h_i(x) d(x) + \alpha \sum_{h_i(x) < 0} -\nabla h_i(x) d(x) + \alpha \sum_{h_i(x) = 0} |\nabla h_i(x) d(x)| \\ \le{}& \|\nabla f(x)\|_2 \|d(x)\|_2 - \alpha. \end{aligned}$$

Hence $P_1'(x(\alpha_j),\alpha_j;d(x(\alpha_j))) < 0$ for $\alpha_j$ sufficiently large. This contradicts the
fact that $x(\alpha_j)$ is a local minimum of $P_1(x,\alpha_j)$. This contradiction establishes the
theorem for $P_1(x,\alpha)$ and consequently for all $P(x,\alpha)$ of (1.2) with $Q$ satisfying (1.3). □

We establish next the existence of a strict local minimum of the exact penalty
function  at  each  strict  local  minimum  of  problem  (1.1)  which  satisfies  the  second 
order  sufficient  optimality  conditions  of  Theorem  3.2.  In  addition  we  are  able  under 
these  assumptions  to  give  a lower  bound  to  the  penalty  parameter  a.  We  begin  by 
establishing  a lemma. 

4.5. Lemma. Let the assumptions of Theorem 3.2 hold. Then for any fixed $(u,v) \in R^{m+k}$
such that $u > \bar{u}$ and $v > |\bar{v}|$, $\bar{x}$ is a strict local minimum of the following function

$$\psi(x,u,v) := f(x) + \sum_{i=1}^{m} u_i g_i(x)_+ + \sum_{j=1}^{k} v_j |h_j(x)|.$$

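Proof. If the lemma were false, then there exists a sequence $\{x^j\}$ converging to $\bar{x}$
such that $x^j \ne \bar{x}$ and $\psi(x^j,u,v) \le \psi(\bar{x},u,v)$, that is

$$f(x^j) - f(\bar{x}) + \sum_{i=1}^{m} u_i g_i(x^j)_+ + \sum_{i=1}^{k} v_i |h_i(x^j)| \le 0.$$

By passing to a subsequence, if necessary, we have a vector $\bar{s}$ with $\|\bar{s}\| = 1$ such
that

$$\bar{s} = \lim_{j \to \infty} \frac{x^j - \bar{x}}{\|x^j - \bar{x}\|}.$$

Therefore

$$\nabla f(\bar{x})\bar{s} + \sum_{i \in I} u_i (\nabla g_i(\bar{x})\bar{s})_+ + \sum_{i=1}^{k} v_i |\nabla h_i(\bar{x})\bar{s}| \le 0$$

where $I = \{i \mid g_i(\bar{x}) = 0,\ i = 1,\dots,m\}$. Since $(\bar{x},\bar{u},\bar{v})$ satisfy the Karush-Kuhn-Tucker
conditions (3.5) we also have that

$$\sum_{i \in I} \big[u_i (\nabla g_i(\bar{x})\bar{s})_+ - \bar{u}_i \nabla g_i(\bar{x})\bar{s}\big] + \sum_{i=1}^{k} \big[v_i |\nabla h_i(\bar{x})\bar{s}| - \bar{v}_i \nabla h_i(\bar{x})\bar{s}\big] \le 0.$$

Because $u > \bar{u}$ and $v > |\bar{v}|$, each term in the above summation is nonnegative and
hence zero. Thus it follows that

$$\begin{aligned} &\nabla g_i(\bar{x})\bar{s} = 0 \quad \text{for } i \in I \text{ and } \bar{u}_i > 0 \\ &\nabla g_i(\bar{x})\bar{s} \le 0 \quad \text{for } i \in I \text{ and } \bar{u}_i = 0 \\ &\nabla h_i(\bar{x})\bar{s} = 0 \quad \text{for } i = 1,\dots,k. \end{aligned}$$

By the second order sufficiency condition (3.6) or equivalently (3.9) it follows that
$\bar{s} \nabla_{11} L(\bar{x},\bar{u},\bar{v}) \bar{s} > 0$. This implies that for sufficiently large $j$

$$L(x^j,\bar{u},\bar{v}) > L(\bar{x},\bar{u},\bar{v})$$

and consequently

$$\begin{aligned} \psi(x^j,u,v) &\ge \psi(x^j,\bar{u},|\bar{v}|) \\ &\ge L(x^j,\bar{u},\bar{v}) \\ &> L(\bar{x},\bar{u},\bar{v}) \\ &= f(\bar{x}) \\ &= \psi(\bar{x},u,v) \\ &\ge \psi(x^j,u,v) \end{aligned}$$

which is a contradiction. Hence the lemma is true. □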

To establish a lower bound for the penalty parameter we need the concept of dual
norm. Recall that for any given vector norm $\|\cdot\|$ in $R^\ell$ there is a corresponding
vector norm $\|\cdot\|'$, called the dual norm of $\|\cdot\|$, which is defined by

$$\|x\|' := \sup_{\|y\| = 1} yx.$$

Recall also that if $\infty \ge p, q \ge 1$ and $\frac{1}{p} + \frac{1}{q} = 1$ then for any $z$ in $R^\ell$ the $p$-norm
$\|z\|_p := \big(\sum_{i=1}^{\ell} |z_i|^p\big)^{1/p}$ and the $q$-norm $\|z\|_q$ are dual to each other. For a
positive definite and symmetric $\ell \times \ell$ matrix $A$ we may define a vector
norm $\|z\|_A$ by $\|z\|_A = (zAz)^{1/2}$. The dual norm of $\|\cdot\|_A$ is $\|\cdot\|_{A^{-1}}$. For a
detailed discussion on the duality of norms readers are referred to Rockafellar [20,
Chapter 15] or Householder [8, pp. 39-45]. We note here that it follows from the
definition of dual norms that if two norms $\|\cdot\|$ and $\|\cdot\|'$ are dual to each other
then for any $x$ and $y$ we have

$$|xy| \le \|x\|\, \|y\|'.$$

This is known as the generalized Cauchy inequality and will be needed in the
proof of the following theorem.
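
The duality of the $p$- and $q$-norms and the generalized Cauchy inequality can be spot-checked numerically. The following Python sketch is an illustration only: the helper names, the sample size and the random seed are assumptions, and a random check is of course no substitute for the proof. It evaluates the gap $\|x\|_p \|y\|_q - |xy|$, which should never be negative, and shows one pair for which the inequality is tight with $p = 1$, $q = \infty$.

```python
import math
import random

def p_norm(z, p):
    """Vector p-norm for 1 <= p <= inf."""
    if p == math.inf:
        return max(abs(v) for v in z)
    return sum(abs(v) ** p for v in z) ** (1.0 / p)

def dual_exponent(p):
    """The q with 1/p + 1/q = 1 (1 and inf are dual to each other)."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1.0)

def cauchy_gap(x, y, p):
    """||x||_p * ||y||_q - |xy|, nonnegative by the generalized Cauchy inequality."""
    inner = sum(a * b for a, b in zip(x, y))
    return p_norm(x, p) * p_norm(y, dual_exponent(p)) - abs(inner)

rng = random.Random(0)  # fixed seed so the check is repeatable
worst_gap = min(
    cauchy_gap([rng.uniform(-1, 1) for _ in range(4)],
               [rng.uniform(-1, 1) for _ in range(4)], p)
    for p in (1, 1.5, 2, 3, math.inf)
    for _ in range(200)
)

# Tight case for p = 1: y is the sign vector of x, so |xy| = ||x||_1 * ||y||_inf.
tight_gap = cauchy_gap([1.0, -2.0, 3.0], [1.0, -1.0, 1.0], 1)
```

The tight case is the one that matters for the lower bound on the penalty parameter: the bound cannot be improved by a constant factor for all multiplier vectors.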

4.6. Theorem (Second Order Sufficiency Implies Strict Local Minimum of Exact Penalty).
Let the assumptions of Theorem 3.2 hold. Let $Q$ satisfy (1.3) and be convex on $R_+$,
let $\|\cdot\|$ be any given vector norm in $R^{m+k}$ and $P(x,\alpha)$ be its corresponding exact
penalty function defined as in (1.2), and let $\|\cdot\|'$ be its dual norm. Then for
any $\alpha > \bar{\alpha}$ where

$$\bar{\alpha} = \frac{\|\bar{u},\bar{v}\|'}{Q'(0+)}$$

the point $\bar{x}$ is a strict local minimum of $P(x,\alpha)$.

Proof. For $\alpha$ satisfying the above inequality we can find $(u,v) \in R^{m+k}$ such that
$u > \bar{u}$, $v > |\bar{v}|$ and

$$\alpha Q'(0+) \ge \|u,v\|' > \|\bar{u},\bar{v}\|'.$$

By Lemma 4.5 there exists a neighborhood $N(\bar{x})$ of $\bar{x}$ such that for $x \ne \bar{x}$ and $x \in N(\bar{x})$

$$\psi(\bar{x},u,v) < \psi(x,u,v)$$

where $\psi$ is defined in the same lemma. Hence by the convexity of $Q$ and by
the generalized Cauchy inequality we have that for any $x \in N(\bar{x})$, $\alpha > \bar{\alpha}$
and $x \ne \bar{x}$ that

$$\begin{aligned} P(x,\alpha) &\ge f(x) + \alpha Q'(0+) \|g(x)_+, h(x)\| \\ &\ge f(x) + \|u,v\|'\, \|g(x)_+, h(x)\| \\ &\ge f(x) + \sum_{j=1}^{m} u_j g_j(x)_+ + \sum_{j=1}^{k} v_j |h_j(x)| \\ &= \psi(x,u,v) \\ &> \psi(\bar{x},u,v) \\ &= P(\bar{x},\alpha). \end{aligned}$$
□



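It is interesting to note that if $Q'(0+) = 1$ then $\bar\alpha = \|\bar u,\bar v\|'$, so that for the penalty function based on the 1-norm $\bar\alpha = \|\bar u,\bar v\|_\infty$.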
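The threshold of Theorem 4.6 can be seen on a one-dimensional example. The sketch below is illustrative only and does not come from the report: the problem (minimize $(x-2)^2$ subject to $x - 1 \le 0$), the choice $Q(t) = t$, the grid and all function names are assumptions. Here $\bar x = 1$ with multiplier $\bar u = 2$, and in $R^1$ every $p$-norm and its dual reduce to the absolute value, so $\bar\alpha = 2$.

```python
import numpy as np

def f(x):
    # Hypothetical objective
    return (x - 2.0) ** 2

def g(x):
    # Hypothetical inequality constraint g(x) <= 0
    return x - 1.0

def penalty(x, alpha):
    # P(x, alpha) = f(x) + alpha * Q(||g(x)_+||) with Q(t) = t, so Q'(0+) = 1
    return f(x) + alpha * np.maximum(g(x), 0.0)

# Stationarity at x_bar = 1: f'(1) + u_bar * g'(1) = -2 + u_bar = 0
x_bar, u_bar = 1.0, 2.0
alpha_bar = abs(u_bar) / 1.0            # ||u_bar||' / Q'(0+)

grid = np.linspace(0.0, 3.0, 3001)      # step 0.001 around x_bar

def grid_minimizer(alpha):
    # Grid point with the smallest penalty value
    return float(grid[np.argmin(penalty(grid, alpha))])

x_above = grid_minimizer(alpha_bar + 1.0)   # alpha > alpha_bar
x_below = grid_minimizer(alpha_bar - 1.0)   # alpha < alpha_bar
print(x_above, x_below)
```

For $\alpha$ above the bound the grid minimizer sits at $\bar x$, while for the smaller value it moves into the infeasible region. The theorem itself only asserts the first of these two behaviours; the second is a feature of this particular example.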


We now establish the fact that the Karush-Kuhn-Tucker conditions (3.5) for problem (1.1) are, under suitable conditions, satisfied at local minima of $P(x,\alpha)$.

4.7. Theorem. If there exists an $\bar\alpha \ge 0$ such that for all $\alpha \ge \bar\alpha$, $P(\bar x,\alpha) \le P(x,\alpha)$ for all $x$ in some open neighborhood $N(\bar x)$ which contains some feasible point of (1.1), and if $f$, $g$ and $h$ are differentiable at $\bar x$, then $\bar x$ and some $(\bar u,\bar v) \in R^{m+k}$ satisfy the Karush-Kuhn-Tucker conditions (3.5) for problem (1.1).

Proof. By Theorem 4.1 $\bar x$ is feasible for problem (1.1) and hence $g(\bar x) \le 0$ and $h(\bar x) = 0$. By Theorem 4.2 $\bar x$ is a local minimum point of $P_1(x,\alpha)$ and consequently $(\bar x, y = 0, z = 0) \in R^{n+m+k}$ constitute a local solution to the problem

$$\begin{aligned}
\underset{(x,y,z) \in R^{n+m+k}}{\text{Minimize}} \quad & f(x) + \alpha(ey + \ell z) \\
\text{subject to} \quad & g(x) - y \le 0 \\
& -y \le 0 \\
& h(x) - z \le 0 \\
& -h(x) - z \le 0 \\
& -z \le 0
\end{aligned} \tag{4.1}$$

where $e$ and $\ell$ are vectors of ones in $R^m$ and $R^k$ respectively. Note that the Arrow-Hurwicz-Uzawa constraint qualification [12] is satisfied at $\bar x$, $y = 0$, $z = 0$. (In fact it is satisfied at all feasible points of (4.1) for which $g$ and $h$ are differentiable.) Hence there exist $(\bar w,\bar r,\bar s,\bar t,\bar q) \in R^{m+m+k+k+k}$ such that $(\bar x, y = 0, z = 0, \bar w,\bar r,\bar s,\bar t,\bar q)$ satisfy the Karush-Kuhn-Tucker conditions for problem (4.1) which turn out to be precisely the Karush-Kuhn-Tucker conditions (3.5) for problem (1.1) upon making the identifications $\bar u = \bar w$ and $\bar v = \bar s - \bar t$. $\square$
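The identification made at the end of the proof can be written out explicitly. The display below is a sketch under two assumptions that are not confirmed by the surviving text: that the fifth multiplier $\bar q$ belongs to a sign constraint $-z \le 0$, and that (3.5) has the usual form of stationarity, feasibility, nonnegativity of $\bar u$ and complementarity.

```latex
\begin{aligned}
L(x,y,z,w,r,s,t,q) &= f(x) + \alpha(ey + \ell z) + w\,(g(x) - y) - r\,y \\
&\quad + s\,(h(x) - z) - t\,(h(x) + z) - q\,z, \\
\nabla_x L = 0:\quad & \nabla f(\bar x) + \bar w \nabla g(\bar x) + (\bar s - \bar t)\,\nabla h(\bar x) = 0, \\
\nabla_y L = 0:\quad & \alpha e - \bar w - \bar r = 0, \\
\nabla_z L = 0:\quad & \alpha \ell - \bar s - \bar t - \bar q = 0, \\
& \bar w\, g(\bar x) = 0, \qquad (\bar w,\bar r,\bar s,\bar t,\bar q) \ge 0 .
\end{aligned}
```

With $\bar u = \bar w$ and $\bar v = \bar s - \bar t$ the first and last lines, together with $g(\bar x) \le 0$ and $h(\bar x) = 0$, are the first order conditions for (1.1). Under the same assumptions the second and third lines give $0 \le \bar u \le \alpha e$ and $|\bar v| \le \bar s + \bar t \le \alpha \ell$, that is $\|\bar u,\bar v\|_\infty \le \alpha$, which agrees with the dual norm bound of Theorem 4.6 for the 1-norm penalty.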

Using  Theorem  4.7  one  may  interpret  the  existence  of  a local  minimum  to  the  exact 
penalty  function  as  a constraint  qualification  which  ensures  the  satisfaction  of  the 
Karush-Kuhn-Tucker  conditions  at  local  minima  of  (1.1). 


We sketch in Figure 1 an outline of the relations obtained in this paper for convenient reference.


CQ:  Constraint  qualification  of  [15] . 

Strict Local Min.: Strict local minimum of problem (1.1).

Local  Min.:  Local  minimum  of  problem  (1.1). 

Exact Penalty Local Min.: Local minimum of the exact penalty function (1.2).

Exact  Penalty  Strict  Local  Min.:  Strict  local  minimum  of  (1.2). 

KKT:  First  order  Karush-Kuhn-Tucker  conditions  (3.5)  for  problem  (1.1). 

FJ2:  Second  order  Fritz  John  conditions  of  Theorem  3.1  for  problem  (1.1). 

KKT2: Second order Karush-Kuhn-Tucker conditions of Theorem 3.2 for problem (1.1).

Figure 1: Summary of Results

Our concluding result generalizes Zangwill's result [22] and is restricted to the convex case. As in Theorem 4.6 an estimate of the size of the penalty parameter $\alpha$ can be obtained in terms of the optimal Lagrange multipliers of the original problem (1.1).


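4.8. Theorem. Let $\bar x$ be a solution of (1.1), $f$ and $g$ be convex on $R^n$ and $h$ be linear. Let $g(\hat x) < 0$ and $h(\hat x) = 0$ for some $\hat x$ in $R^n$. For any given vector norm $\|\cdot\|$ in $R^{m+k}$ let $P(x,\alpha)$ be its corresponding exact penalty function defined as in (1.2) with $Q$ satisfying (1.3) and being convex on $R_+$. Then $P(\bar x,\alpha) \le P(x,\alpha)$ for all $x$ in $R^n$ and $\alpha \ge \bar\alpha$ where ... $\|\cdot\|'$ is the dual norm of $\|\cdot\|$ ...

... is convex on $R^n$ we have that ... $\alpha \ge \bar\alpha$ and any $x \in R^n$ we have that

... $\le f(x) + \bar u g(x) + \bar v h(x)$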


(By convexity of $Q$)

Note Added in Proof

Just  before  this  report  was  sent  to  the  printer  we  became  aware  of  another  closely 
related  paper 

S.  Dolecki  and  S.  Rolewicz,  "Exact  penalty  for  local  minima",  to  appear, 
in  which  multifunction  theory  was  used  to  derive  exactness  of  the  penalty  function 
$P_1(x,\alpha)$ under a "controllability condition" which is equivalent to the constraint
qualification (2.1).